Improved noise-robustness in distributed speech recognition via perceptually-weighted vector quantisation of filterbank energies

نویسندگان

  • Stephen So
  • Kuldip K. Paliwal
چکیده

In this paper, we examine a coding scheme for quantising feature vectors in a distributed speech recognition environment that is more robust to noise. It consists of a vector quantiser that operates on the logarithmic filterbank energies (LFBEs). Through the use of a perceptually-weighted Euclidean distance measure, which emphasises the LFBEs that represent the spectral peaks, the vector quantiser codebook provides a priori knowledge of the spectral characteristics of clean speech and is used to quantise features from noise-corrupted speech. Our comparative results from the ETSI Aurora-2 recognition task show that the perceptually-weighted vector quantisation of LFBEs achieves higher recognition accuracies for noisy speech than the unweighted vector quantisation, memoryless and multi-frame GMM-based block quantisation and scalar quantisation of Mel frequency-warped cepstral coefficients.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving the robustness of the usual FBE-based ASR front-end

All speech recognition systems require some form of signal representation that parametrically models the temporal evolution of the spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands which are often distributed in a mel scale. The computation of those filterbank energies (FBE) always includes smoothing of basic spectral m...

متن کامل

Spectral estimation and normalisation for robust speech recognition

Speech recognition in adverse conditions =mains a difficult but challenging problem. It is already shown [ 1) that normalisation of the dynamic range (SNR') of the kequency channels in a me1 scale triangular filterbank (MFCC) [2], improves the robustness against both additive and convolutional noise. Nevertheless, because the method is based on a masking-technique, the improvement is small in t...

متن کامل

An Investigation on the Use of i-Vectors for Robust ASR

In this paper we propose two different i-vector representations that improve the noise robustness of automatic speech recognition (ASR). The first kind of i-vectors is derived from “noise only” components of speech provided by an adaptive MMSE denoising algorithm, the second variant is extracted from mel filterbank energies containing both speech and noise. The effectiveness of both these repre...

متن کامل

Improving robustness to compressed speech in speaker recognition

The goal of this paper is to analyze the impact of codecdegraded speech on a state-of-the-art speaker recognition system and propose mitigation techniques. Several acoustic features are analyzed, including the standard Mel filterbank cepstral coefficients (MFCC), as well as the noise-robust medium duration modulation cepstrum (MDMC) and power normalized cepstral coefficients (PNCC), to determin...

متن کامل

MMSE estimation of log-filterbank energies for robust speech recognition

In this paper, we derive a minimum mean square error log-filterbank energy estimator for environment-robust automatic speech recognition. While several such estimators exist within the literature, most involve trade-offs between simplifications of the log-filterbank noise distortion model and analytical tractability. To avoid this limitation, we extend a well known spectral domain noise distort...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005